Abstract

Background: Streptococcus gallolyticus subsp. gallolyticus is an important causative agent of infectious endocarditis,
while the pathogenicity of this species is largely unclear. To gain insight into the pathomechanisms and the
underlying genetic elements for lateral gene transfer, we sequenced the entire genome of this pathogen.

Results: We sequenced the whole genome of S. gallolyticus subsp. gallolyticus strain ATCC BAA-2069, consisting of
a 2,356,444 bp circular DNA molecule with a G+C-content of 37.65% and a novel 20,765 bp plasmid designated as
pSGG1. Bioinformatic analysis predicted 2,309 ORFs and the presence of 80 tRNAs and 21 rRNAs in the
chromosome. Furthermore, 21 ORFs were detected on the plasmid pSGG1, including the tetracycline resistance genes
tetL and tet(O/W/32/O). Screening of 41 S. gallolyticus subsp. gallolyticus isolates revealed one plasmid (pSGG2)
homologous to pSGG1. We further predicted 21 surface proteins containing the cell wall-sorting motif LPxTG,
a class of proteins shown to play a functional role in the adhesion of bacteria to host cells. In addition, we
performed a whole genome comparison to the recently sequenced S. gallolyticus subsp. gallolyticus strain UCN34,
revealing significant differences.

Conclusions: The analysis of the whole genome sequence of S. gallolyticus subsp. gallolyticus promotes
understanding of genetic factors concerning the pathogenesis and adhesion to ECM of this pathogen. For the first
time we detected the presence of the mobilizable pSGG1 plasmid, which may play a functional role in lateral gene
transfer and confer a selective advantage through tetracycline resistance.







Background 

Streptococcus gallolyticus subsp. gallolyticus (formerly 
known as S. bovis biotype I) is a gram-positive bacter- 
ium belonging to the Lancefield Group D streptococci. 
Over the last ten years, the classification of S. gallolyti- 
cus subsp. gallolyticus has been revised several times 
[1-4]. S. bovis was previously divided into three biotypes,
designated as biotype I, biotype II/1, and biotype II/2.
The majority of isolates associated with human endocar- 
ditis have been assigned to biotype I, which was recently 
reclassified as Streptococcus gallolyticus subsp. gallolyti- 
cus [5]. Furthermore, S. gallolyticus subsp. gallolyticus is 



* Correspondence: jdreier@hdz-nrw.de

1 Institut für Laboratoriums- und Transfusionsmedizin, Herz- und
Diabeteszentrum Nordrhein-Westfalen, Universitätsklinik der Ruhr-Universität
Bochum, Georgstraße 11, 32545 Bad Oeynhausen, Germany
Full list of author information is available at the end of the article



a common member of the microflora and is found in the
gastrointestinal tract of approximately 2.5 to 15% of
healthy humans [6,7]. It is an opportunistic human
pathogen which can cause several bacterial infections, 
including septicemia and endocarditis. Over the last few 
years, the percentage of cases of endocarditis caused by 
group D streptococci has significantly increased [8-10]. 
Recently, Russel et al. estimated that S. gallolyticus
subsp. gallolyticus is the causative agent in 24% of streptococcal
endocarditis cases [11]. In addition, several studies report strong
correlations between the occurrence of colon neoplasms and
S. gallolyticus subsp. gallolyticus infection [7,12], while the
underlying pathomechanisms are still unknown. Sillanpää et al. suggest
that premalignant and malignant lesions in the intestinal tract could
facilitate translocation of S. gallolyticus subsp. gallolyticus
through the disrupted mucosal barrier and provide
access to blood circulation [13]. Furthermore, studies
have suggested a linkage between inflammation by S. 
bovis and colon carcinogenesis [14]. In addition, a vari- 
ety of animal infections, such as mastitis, septicemia in 
poultry, lactic acidosis and infections of various rumi- 
nant animals are caused by S. gallolyticus subsp. galloly- 
ticus [15-17]. However, the exact pathomechanisms of S. 
gallolyticus subsp. gallolyticus or S. bovis infections 
remain unclear. 

S. gallolyticus subsp. gallolyticus shares its environment
with numerous other potentially pathogenic bacteria,
such as S. agalactiae and Enterococcus faecalis.
This implies the possibility of horizontal gene
transfer of antimicrobial resistance genes or genomic 
islands, e.g. phage-related clusters, by transposons, plas- 
mids or phages, within the human gut or the animal 
rumen [18]. Several studies have reported the occur- 
rence of competence-stimulating peptides in S. bovis 
[19]. These factors facilitate the acquisition of novel 
genes, resistance islands or virulence-associated regions 
[20], in particular when several species coexist within 
biofilms [21]. Recently we were able to show the cap- 
ability of biofilm formation on polystyrene surfaces for 
S. gallolyticus subsp. gallolyticus [22]. Nevertheless, most 
of the mechanisms of transfer and insertion are poorly 
understood [23,24]. 

In vitro studies have demonstrated the adhesion and 
invasion of S. gallolyticus subsp. gallolyticus to extracel- 
lular matrix proteins [22,25], virulence associated pro- 
teins [13,26,27], as well as EA.hy926 or HUVEC cells 
[22]. Furthermore, studies have addressed biosynthesis 
of capsular polysaccharides [28] and fimbriae-like struc- 
tures on the bacterial surface in S. gallolyticus subsp. 
gallolyticus [29]. It has been demonstrated that S. gallolyticus
subsp. gallolyticus has 11 cell wall-anchored proteins
with "microbial surface components recognizing adhesive
matrix molecules" (MSCRAMM) characteristics, including
a collagen-binding adhesin and proteins with similarities
to pilus subunits [13].

Recently, Rusniok et al. published the first whole genome
sequence of S. gallolyticus subsp. gallolyticus strain
UCN34 and analyzed the main metabolic and cell sur- 
face features, particularly with regard to adaptation to 
the rumen and the virulence association of polysacchar- 
ide capsule, glucan mucopolysaccharides, different types 
of pili and collagen binding proteins [30]. 

Here we present the whole genome sequence of a previously
undescribed, considerably divergent S. gallolyticus subsp.
gallolyticus strain. The strain under study was the
tetracycline-resistant strain ATCC BAA-2069, isolated from
a patient with infectious endocarditis. We demonstrate
the occurrence of a previously undescribed plasmid
(pSGG1) which carries genes for tetracycline resistance
(tetL, tet(O/W/32/O)) and reveals strong sequence
similarities to plasmids and chromosomes from several
ruminal and gastrointestinal bacteria, indicating that
pSGG1 may act as a native carrier for horizontal gene
transfer.

Results 

General genome properties 

The whole genome sequence of S. gallolyticus subsp.
gallolyticus BAA-2069 was determined by pyrosequencing using
the 454 GS FLX Titanium technique (Roche, Mannheim,
Germany). After assembly of the 454 reads, remaining
gaps were closed by PCR and conventional Sanger
sequencing. The genome contains a 2,356,444 bp circular
DNA molecule with a G+C-content of 37.65% and a
previously undescribed 20,765 bp plasmid designated as
pSGG1. The gene set was mapped against the S.
gallolyticus subsp. gallolyticus UCN34 genome (GenBank
Acc. No.: FN597254) [30]. Bioinformatic analysis
predicted 2,309 open reading frames (ORFs) and the
presence of 80 tRNAs and 21 rRNAs in the chromosome,
as well as 21 ORFs on the plasmid pSGG1.
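As an illustration of how a G+C-content figure such as the 37.65% reported above is obtained, the following minimal sketch counts G and C among the unambiguous bases of a sequence. The function name `gc_content` and the toy sequence are hypothetical and are not part of the original analysis pipeline.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage.

    Ambiguous symbols (e.g. N) are ignored; this is an assumption of
    this sketch, not a documented choice of the original study.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


# Toy sequence for illustration only (not genome data).
toy = "ATGCGCATTA"
print(f"G+C content: {gc_content(toy):.2f}%")
```

Applied to the complete chromosome sequence (for example after reading the deposited FASTA record), the same count would be expected to reproduce the reported value.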

The size of the BAA-2069 circular chromosome
(2,356,444 bp) exceeds the average of other previously
published streptococcal genomes by 12% (mean: 2.1 Mb;
n = 15) (Table 1, Figure 1). Direct comparison shows
that only the S. sanguinis SK36 genome is larger
(2,388,435 bp), and the G+C-content is 1.7% lower than
average (range from 35.3 to 43.4%; n = 15). Altogether
2,309 ORFs were automatically annotated, which is 10%
higher than the average of all completely sequenced
Streptococcus genomes (2,107 ORFs). In direct comparison to
the S. gallolyticus subsp. gallolyticus genome UCN34,
the BAA-2069 genome is 5.5 kb larger (2,356,444 versus
2,350,911 bp), has 70 more CDS (2,309 versus 2,239) and
contains the 20,765 bp plasmid pSGG1.
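The comparisons in this paragraph are simple ratios and differences, and they can be re-derived from the numbers given in the text. The sketch below does so; all variable names are hypothetical, and the mean genome size is taken as the rounded 2.1 Mb stated above, so the percentages are approximate.

```python
# Values as reported in the text.
baa_size_bp, ucn_size_bp = 2_356_444, 2_350_911
mean_size_bp = 2_100_000          # reported as 2.1 Mb (rounded)
baa_orfs, ucn_orfs, mean_orfs = 2_309, 2_239, 2_107

# Relative excess over the streptococcal averages, in percent.
size_excess_pct = 100 * (baa_size_bp / mean_size_bp - 1)
orf_excess_pct = 100 * (baa_orfs / mean_orfs - 1)

# Absolute differences between BAA-2069 and UCN34.
size_diff_bp = baa_size_bp - ucn_size_bp
orf_diff = baa_orfs - ucn_orfs

print(f"chromosome size vs. mean: +{size_excess_pct:.1f}%")
print(f"ORF count vs. mean:       +{orf_excess_pct:.1f}%")
print(f"BAA-2069 minus UCN34:     {size_diff_bp} bp, {orf_diff} CDS")
```

The differences are taken as BAA-2069 minus UCN34, so a positive value means the BAA-2069 figure is the larger one.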

The sequences and annotations of chromosome and 
plasmid have been deposited at the NCBI GenBank 
(Acc. No. FR824043, FR824044). 

Comparative genomics 

In a direct comparison of the BAA-2069 genome to UCN34,
we noted various inserted or deleted ORFs and regions
scattered along the genomes; nonetheless, the majority
of genetic information is shared by both strains. The
BAA-2069 genome contains 2,040 (87%) ORFs which are
predicted to be common to BAA-2069 and UCN34. The
arrangement of genetic information is very similar overall,
based on alignment of the genomes and the synteny
plot (Figure 2, Additional file 1: Figure S1). The comparison
of the BAA-2069 genome with UCN34 showed
about 224 kb (9.5%) of unmatched genetic information.
In the UCN34 genome, 199 (9%) unique genes are present,
whereas the BAA-2069 genome contains 269 (12%) unique
or weakly similar genes. There are numerous strain-specific
regions with functional genes that originated from
genetic evolution or lateral gene transfer (LGT). Due to
the high number of genomic differences, we focused on
genes and regions relating to putative virulence-associated
functions or genes affected by habitat adaptation.
All unique genes and corresponding islands
calculated by EDGAR analysis are summarized in
Additional file 2: Table S1 (BAA-2069) and Additional
file 3: Table S2 (UCN34).

Table 1 Comparison of Streptococcus/Enterococcus species with S. gallolyticus subsp. gallolyticus BAA-2069
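The shared and unique gene sets above come from EDGAR, which (as described in the Methods) defines orthologs by bidirectional best BLAST hits and score ratio values (SRV). The following is a minimal sketch of that idea, not EDGAR's implementation: the function names, the toy bit scores and the fixed cutoff `min_srv = 0.3` are assumptions made for illustration, and the SRV is taken here as the hit score divided by the query's self-hit score.

```python
from typing import Dict, List, Tuple

Hits = Dict[Tuple[str, str], float]  # (query, subject) -> BLAST bit score


def best_hits(hits: Hits) -> Dict[str, str]:
    """Map every query to its highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in hits.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def orthologs(a_vs_b: Hits, b_vs_a: Hits,
              self_scores: Dict[str, float],
              min_srv: float = 0.3) -> List[Tuple[str, str, float]]:
    """Return reciprocal best-hit pairs whose SRV reaches min_srv."""
    best_ab, best_ba = best_hits(a_vs_b), best_hits(b_vs_a)
    pairs = []
    for a, b in best_ab.items():
        if best_ba.get(b) != a:
            continue  # not a bidirectional best hit
        srv = a_vs_b[(a, b)] / self_scores[a]  # normalise by self-hit score
        if srv >= min_srv:
            pairs.append((a, b, round(srv, 2)))
    return sorted(pairs)


# Hypothetical toy scores, for illustration only.
a_vs_b = {("a1", "b1"): 180, ("a1", "b2"): 60, ("a2", "b2"): 50, ("a3", "b1"): 90}
b_vs_a = {("b1", "a1"): 175, ("b2", "a2"): 55, ("b2", "a1"): 40}
self_scores = {"a1": 200, "a2": 250, "a3": 150}

print(orthologs(a_vs_b, b_vs_a, self_scores))
```

In this toy case only the pair a1/b1 is kept: a2/b2 is reciprocal but falls below the assumed cutoff, and a3 is not the best hit of b1. Genes of one genome without any accepted partner in the other would be counted as unique.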

Comparison of whole chromosome sequences by
MAUVE software [31] reveals an alignment consisting
of 13 locally collinear blocks (LCBs) (Figure 2). No significant
inversions or displacements of large regions
between the S. gallolyticus subsp. gallolyticus genomes
of BAA-2069 and UCN34 were observed. Regions with







Figure 1 Distribution of whole genome characteristics. Black dot represents S. gallolyticus subsp. gallolyticus strain BAA-2069. Black
square represents S. gallolyticus subsp. gallolyticus strain UCN34. Symbols represent genomes of S. agalactiae A909, S. dysgalactiae subsp.
equisimilis GGS_124, S. equi subsp. equi 4047, S. sanguinis SK36, S. suis BM407, S. uberis 0140J, S. pyogenes MGAS9429, S. pneumoniae ATCC 700669,
S. mutans NN2025, S. mitis B6, S. thermophilus LMD-9, S. gordonii str. Challis substr. CH1, S. oralis ATCC 35037, S. salivarius SK126.
Figure 2 Whole genome alignment of the two strains of S. gallolyticus subsp. gallolyticus. Representation of 13 locally collinear blocks (LCBs)
between chromosomal sequences of S. gallolyticus subsp. gallolyticus BAA-2069 and UCN34, generated by MAUVE 2.3.1 software [31]
with a minimum weight of 355. The sequence of BAA-2069 (top) is the reference against UCN34. The connecting lines between blocks indicate the
location of each block in the two genomes. Each colored block represents a homologous region without rearrangements, although white areas
within blocks are strain-specific.



low similarity to the corresponding genome occur fre- 
quently and their distribution is almost random, 
although the region from base 2,117,000 bp to the end 
of the genome seems to be more conserved. 

Virulence factors 

The BAA-2069 genome contains a 34 kb unique insertion
comprising 35 ORFs (SGGBAA2069_c20310-c20660),
including the putative major cell surface adhesin gene
pac. This gene is a major colonization factor in S.
mutans [32] and may play a similar role in BAA-2069;
in addition, it has 84% similarity to a gene in UCN34
(Gallo_1675). Almost identical to this region is another
30 kb section in the BAA-2069 genome
(SGGBAA2069_c13640-c13980). Both genetic islands
could be functionally virulence-associated, as they
encode several proteins for cell adhesion and other
virulence-determining factors.

In addition, we found a unique 23 kb genetic island in
the BAA-2069 genome, coding for bacteriocin-associated
genes (SGGBAA2069_c00810-c00960). This region contains
genes for lanthionine biosynthesis and for a
bacteriocin/lanthionine exporter orthologous to genes
described in S. mutans and S. ratti. Lanthionine-containing
peptides (lantibiotics) are bacteriocins and form a unique
class of peptide antibiotic substances [33]. In an agar
overlay experiment, BAA-2069 inhibited the growth of
Lactococcus lactis, resulting in a zone of clearing around
BAA-2069 (data not shown).

Three genes (SGGBAA2069_c05730, c12530, c17410)
are partly homologous to hemolysin A, hemolysin III and
an undefined hemolysin-like protein, although group D
streptococci are usually non-hemolytic or occasionally
display weak alpha-hemolysis. Indeed, BAA-2069 shows
alpha-hemolysis on Schaedler agar with 5% sheep
blood.

The polysaccharide capsule coding region contains 12
cps genes, beginning with cpsA (SGGBAA2069_c09190 - c09300).
The genes are located in a 13.5 kb region and are identical
to those of the UCN34 genome.

Comparison of surface proteins 

We predicted 21 proteins with a C-terminal LPxTG motif
by in silico analysis. Additionally, we found orthologous
or similar genes to all the proteins with MSCRAMM
characteristics described by Sillanpää et al. regarding the
S. gallolyticus subsp. gallolyticus TX20005 genome
("Sbs" genes) and to genes mentioned by Rusniok et al.
regarding the UCN34 genome ("Gallo_" genes) [13,30].
All genes with the LPxTG motif and their best hits in
related genomes are listed in Table 2.
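A C-terminal LPxTG motif can be screened for with a simple pattern search over the end of each predicted protein. The sketch below shows the principle only; the function name, the 50-residue window and the toy sequences are assumptions, and the tool actually used for the in silico prediction is not specified in this section. Dedicated predictors usually also require the hydrophobic stretch and the positively charged tail that follow the motif.

```python
import re

# L-P-any residue-T-G, the sortase recognition motif.
LPXTG = re.compile(r"LP.TG")


def has_c_terminal_lpxtg(protein: str, window: int = 50) -> bool:
    """Return True if LPxTG occurs within the last `window` residues.

    The window size is an assumption of this sketch.
    """
    tail = protein.upper()[-window:]
    return LPXTG.search(tail) is not None


# Hypothetical toy sequences, for illustration only.
anchored = "M" + "A" * 100 + "LPKTGE" + "A" * 20   # motif near the C-terminus
not_anchored = "MLPETG" + "A" * 100                # motif only at the N-terminus

print(has_c_terminal_lpxtg(anchored), has_c_terminal_lpxtg(not_anchored))
```

Restricting the search to the C-terminal tail avoids counting chance occurrences of the five-residue pattern elsewhere in long proteins.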

Within the analysis, we found three proteins containing
the LPxTG motif carried by genomic islands specific
to strain BAA-2069. The gene SGGBAA2069_c13880
and its paralog SGGBAA2069_c20560 have only very
weak similarities to Gallo_1675 and code for a putative
major cell surface adhesin (pac). The gene
SGGBAA2069_c13900 and its paralog
SGGBAA2069_c20580 have cell anchor characteristics
but no similarities to functional genes. Furthermore, the
unique gene SGGBAA2069_c22120, whose product comprises
the LPxTG motif, is another candidate with a putative
function in virulence.

Protective elements 

In comparison to S. gallolyticus subsp. gallolyticus
UCN34, the BAA-2069 genome holds two more restriction
enzyme genes. The type III enzyme SthIR
(SGGBAA2069_c10290) is located on a 9.9 kb unique
island (SGGBAA2069_c10280 - c10350), together with
the corresponding restriction-methylation subunit and
an integrase gene. In addition, the type II restriction
endonuclease Eco47II and its modification methylase are
encoded on a 9.7 kb region (SGGBAA2069_c22460 - c22570).

Table 2 Overview and comparison of S. gallolyticus subsp. gallolyticus genes containing the LPxTG DNA motif
The indicated percentage in brackets represents the query coverage to the orthologous gene and the identities revealed by blastn (two sequence alignment)

Notable regions missing in BAA-2069 but present
in the UCN34 genome include a 46 kb phage-associated
region containing a putative phage-associated cell
wall hydrolase. A "clustered regularly interspaced short
palindromic repeats" (CRISPR) element is located between
1,507,890 and 1,508,913 bp and contains 16 repetitions
of a 36 bp consensus sequence. Another 5.6 kb
CRISPR-associated region is located at 1,515,490 - 1,516,317 bp
but is mostly conserved between the two strains (BAA-2069:
1,517,213 - 1,518,237 bp). A CRISPR locus unique to
BAA-2069 lies between 1,515,726 and 1,516,570 bp. The
corresponding cas genes are SGGBAA2069_c14660 and
c14670 (cas2) and c14670 (cas1) for BAA-2069, and
Gallo_1437, Gallo_1444 (cas2) and
Gallo_1438, Gallo_1439 (cas1) for UCN34. CRISPR data
of both genomes are also accessible via the CRISPRs web
server (http://crispr.u-psud.fr).

Genome comparison to related species 

To evaluate the genetic distance to related species, a
direct comparison to the taxonomically most closely
related species with available whole genome sequences,
in particular S. uberis 0140J and S. agalactiae 2603V/R,
was conducted. The analysis revealed a core genome
consisting of 1,118 genes common to all three species,
whereas S. gallolyticus subsp. gallolyticus BAA-2069 has
804 unique genes (Figure 3). Furthermore, we included
three Enterococcus faecalis genomes (V583, OG1RF and
62 [34-36]). Comparison analysis revealed a set of 825
common genes, including a putative hemolysin A gene
(SGGBAA2069_c05730), a fibronectin/fibrinogen binding
protein (SGGBAA2069_c08170) and a sortase A
gene (SGGBAA2069_c11150), which could have a
conserved role in virulence (Additional file 4: Table
S3). A complete list of common or unique ORFs in
comparison to BAA-2069, considering all known
Streptococcus genomes, is shown in Additional file 5: Table
S4. Furthermore, a taxonomic analysis based on alignment
of core genomes was performed (Figure 4). The
calculation includes the total number of coding
sequences common to all analyzed species [37]. The
resulting phylogenetic tree indicates considerable genomic
diversity between S. gallolyticus subsp. gallolyticus and
related whole genome sequenced species.
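Once orthologous genes have been grouped into families, the core genome and the strain-specific sets shown in a three-way Venn diagram reduce to set operations. The sketch below illustrates this with hypothetical gene-family identifiers; it is not the EDGAR implementation, and the toy sets do not reproduce the counts reported above.

```python
from typing import Dict, Set


def venn_counts(a: Set[str], b: Set[str], c: Set[str]) -> Dict[str, int]:
    """Partition three gene-family sets into the seven Venn regions."""
    return {
        "core": len(a & b & c),         # shared by all three genomes
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "a_only": len(a - b - c),       # unique to genome a
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
    }


# Hypothetical gene-family identifiers, for illustration only.
genome_a = {"g1", "g2", "g3", "g4"}
genome_b = {"g1", "g2", "g5"}
genome_c = {"g1", "g3", "g5", "g6"}

counts = venn_counts(genome_a, genome_b, genome_c)
print(counts)
```

The seven regions are disjoint, so their counts sum to the size of the union of the three sets, which is a useful sanity check on any such partition.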

Plasmid 

A plasmid designated as pSGG1 was identified by
sequence analysis and later isolated from S. gallolyticus
subsp. gallolyticus BAA-2069 (Figure 5). The plasmid
pSGG1 consists of 20,765 bp and contains 21 ORFs, of
which 14 genes code for proteins with similarities to


Hinse et al. BMC Genomics 2011, 12:400
http://www.biomedcentral.com/1471-2164/12/400






Figure 3 Venn diagram of different streptococcal genomes. Venn diagram generated by EDGAR analysis [37], representing common and
strain-specific genes of S. gallolyticus subsp. gallolyticus BAA-2069, S. uberis 0140J and S. agalactiae 2603V/R. Numbers in intersections
represent the number of genes common to two or three species.



entries in sequence databases, including the tetracycline resistance
gene tetL (SGGBAA2069_p00110) and the mosaic tetracycline resistance
gene tet(O/W/32/O), which are common in plasmids of gram-positive
pathogens. Two
Figure 4 Phylogenetic tree of different streptococcal genomes. Tree was calculated by alignment of core genomes. Non-matching parts of
the alignment were masked and subsequently removed. For further calculation details see Methods.
Figure 5 Plasmid map of pSGG1. Plasmid pSGG1 isolated from S. gallolyticus subsp. gallolyticus BAA-2069. Unique regions are marked by
squared boxes.



insertion sequence IS1216 elements and a putative resolvase
and a relaxase gene were identified. The relaxase
gene has similarities to that of plasmid pTet35 from
Campylobacter jejuni subsp. jejuni 81-176, which suggests
classification of the conjugative transfer system in clade
MOBP4. However, it is more likely that it belongs to the
MOBV cluster, which is still ancestrally related to MOBP
[38]. Replication is probably regulated by one of
four putative rep elements, belonging to the rep_1
superfamily (SGGBAA2069_p00100) and the rep_3 superfamily
(SGGBAA2069_p00020, p00140, p00200). The repA
element (SGGBAA2069_p00140) has 78% sequence
identity to that of the cryptic plasmid pSBO1 isolated
from S. equinus [39]. However, we were not able to
determine the functional rep gene by in silico analysis.
The plasmid pSGG1 seems to be incapable of conjugal
self-transfer since it encodes no tra protein and only a
putative resolvase, although this was not tested
experimentally. Moreover, a mobilization region orthologous
to a mob gene in Streptococcus ferus was found
(SGGBAA2069_p00200), which is a necessary feature
for transmissible plasmids and therefore promotes the
ability for LGT in the presence of a helper conjugative
plasmid. Five ORFs were assigned to encode proteins
with unknown functions, and no significant
sequence similarities to previously described genes exist
in these cases (Figure 5). The analysis of sequence identity
to other plasmids or genomes reveals a mosaic-like
structure with a high number of similarities to
common inhabitants of the rumen or the gastrointestinal
tract, including different streptococcal species as well as
Enterococcus and others. In particular, the tetracycline
resistance genes, which are very common among streptococci,
are partly identical among many different plasmids
and species, although no similarities in the
arrangement of resistance genes were observed. To evaluate
the distribution of pSGG1 among strains of S. gallolyticus
subsp. gallolyticus of different origin (animal,
strain collections and human samples), we screened 41
strains by Southern blot hybridization analysis with a
digoxigenin nick-labeled probe of pSGG1 (Figure 6).
We identified and isolated a plasmid (pSGG2) mainly
homologous to pSGG1 in another strain (isolate
010672), originally isolated from the blood culture of a
patient with infectious endocarditis. The restriction fragment
analysis of pSGG2 revealed a partially different
pattern in comparison to pSGG1, indicating sequence
variation between both plasmids (Additional file 6:
Figure S2). In further experiments we sequenced the
pSGG2 plasmid and found differences only in non-coding
regions (data not shown).

Figure 6 Southern blot analysis of BamHI-digested plasmids from two S. gallolyticus subsp. gallolyticus strains. Total DNA was digested
with BamHI and hybridized against a probe consisting of DIG-11-UTP-labeled pSGG1 plasmid DNA. Lane 1: S. gallolyticus subsp. gallolyticus strain
010672 genomic DNA. Lane 2: S. gallolyticus subsp. gallolyticus strain BAA-2069 genomic DNA (positive control). Lane 3: Plasmid DNA of pSGG1.
M: DIG DNA Molecular Weight Marker VII, DIG-labeled (Roche, Mannheim, Germany).

In order to analyze whether the frequency and phenotype
of tetracycline resistance of strain BAA-2069
coincide with the presence of pSGG1, we screened 41
S. gallolyticus subsp. gallolyticus strains for the presence of
tetL and mob genes by PCR. Additionally, we performed
a tetracycline susceptibility test. The epidemiological
cut-off for the wild type (WT) of related streptococci is
< 1 µg/mL (http://eucast.org). About 42% of strains were
growth-inhibited by a tetracycline concentration between
0.5 and 1 µg/mL, and 95% of strains tested were unable to
grow at concentrations higher than 256 µg/mL. The two
strains which showed a tetracycline MIC value of 512
µg/mL carried a pSGG plasmid, and only these
tested positive for the tetL and mob genes (Additional
file 7: Figure S3).

Discussion 

The present study describes the full genome sequence of 
S. gallolyticus subsp. gallolyticus BAA-2069 and the 
comparison to related genomes in order to evaluate pos- 
sible virulence-associated characteristics of this species. 
Previous publications have shown a significant diversity 
in adhesion and invasion potential for binding to 
endothelial host cells, as well as binding to ECM pro- 
teins in vitro [22]. Other studies have shown that viru- 
lence gene profiles are associated with disease [40]. 
Therefore genomic comparison analysis provides the 
basis for understanding pathogenicity. 

Within whole genome comparison analysis the "pan- 
genome" includes a core genome containing genes pre- 
sent in all strains of one species. This is complemented 
by an individual set of genes unique to a strain or not 
shared by all strains. With the growing number of 
sequenced strains, the increasing size of the pan-genome 
is evidence of the genomic diversity between different 
isolates of a distinct species. Tettelin et al. have shown 
that in the case of Streptococcus agalactiae the core gen- 
ome of eight strains comprises about 80% of genes of 
any single genome, and exploration of data reveals that 
the gene reservoir is immense [41], whereas in the case 
of Bacillus anthracis the number of strain-specific genes 
after addition of the fourth strain was zero [42]. The
proportion of strain-specific regions in the two analyzed S.
gallolyticus subsp. gallolyticus strains is about 3.5% higher
than in S. agalactiae strains (average 7.27%, maximum ~10%).
This could be taken as a hint that, with an increasing
number of sequenced strains, the pan-genome of S.
gallolyticus subsp. gallolyticus will prove to be far
larger by proportion. However, these data are preliminary,
pending the sequencing of further S. gallolyticus
subsp. gallolyticus strains.

In a direct comparison to the recently sequenced 
strain UCN34 [30], surprisingly many unique genes with 
putative virulence associated characteristics are present 
in each strain, which could be an indication that the 
pathogenicity of S. gallolyticus subsp. gallolyticus is very 
diverse. The majority of exclusive sequences found in 



the UCN34 genome are located in three large regions 
representing 111 kb of sequence information (53%), 
whereas the three largest unique regions in BAA-2069 
constitute only 87 kb (39%) of strain-specific sequence 
and mostly consist of smaller regions. However, the tendency
of virulence factors to be located within genomic
islands may lead to a higher rate of exchange of
such genes in comparison to other regions [43]. Furthermore,
the additional restriction enzymes in BAA-2069 may
have a function in protection against viral DNA, while
heritable CRISPR elements are able to mediate immunity
against phages and can be transmitted to other organisms
by genetic transformation events [44].

Surface proteins, and in particular proteins belonging
to the "microbial surface components recognizing adhesive
matrix molecules" (MSCRAMM), have been shown to play a
functional role in the pathogenesis of many bacteria. Of specific
interest is a group of proteins containing the C-terminal 
cell wall-sorting motif LPxTG, which serves as a recog- 
nition site for the membrane-associated sortase. After 
sortase-mediated cleavage of the protein, the polypeptide 
is covalently bound to the peptidoglycan of bacterial cell 
surface and can therefore promote the first step in bac- 
terial adherence [45,46]. Three of the 21 predicted 
LPxTG motif genes are unique for BAA-2069 and 
further studies are required to evaluate their contribu- 
tion to pathogenicity. 

In silico analysis of genome data strongly indicated the
presence of a multi-copy plasmid. The purification of
plasmid DNA and further analysis of sequence data confirmed
this indication and showed that the tetracycline
resistance genes are located on the plasmid. Analysis of
plasmid distribution shows only two mainly homologous
plasmids in 41 strains overall. Therefore, pSGG
plasmids do not seem to be widespread among S.
gallolyticus isolates. The mosaic tetracycline resistance gene
tet(O/W/32/O) is usually chromosomally located and
mediates resistance by ribosome protection. It has been
shown that the mosaic tet(O/W) genes confer a higher
level of resistance than the original genes [47]. This
is consistent with our experimental data, which show that
the strains carrying a pSGG plasmid have the highest
resistance levels. The tetL gene is generally found on
plasmids and codes for a tetracycline efflux protein
[48]. In contrast to the BAA-2069 strain, the tetracycline
resistance of strain UCN34, mediated by tetL and tetM,
is located on the chromosome, adjacent to putative
plasmid and transposon Tn916-associated genes
[30]. This suggests an association between high
tetracycline resistance mediated by tetL and the occurrence
of plasmids of the pSGG family.

Because of antibiotic treatment, the gastrointestinal tract
and rumen are well-known reservoirs of mobilizable
antibiotic resistance genes [49]. Furthermore, the
transfer of antibiotic resistance across several species
and genera between commensal bacteria is well known,
and habitats with a dense population and, in particular,
the ability to form biofilms are optimal for genetic
transfer [50]. Because S. gallolyticus subsp.
gallolyticus is a commensal and facultative pathogen of
animals, the intensive tetracycline treatments in animal
husbandry give pathogenic and natural inhabitants of
the intestinal tract a general advantage in evolutionary
fitness when they acquire resistance genes by
LGT [51,52]. Although the plasmid pSGG1 appears to be
incapable of conjugal self-transfer, it is probably
mobilizable by a helper conjugative plasmid. These findings
suggest that it may play a functional role in LGT between
different streptococcal groups and further related species.
However, the detection of only two plasmids in 41 strains
is so far not evidence of LGT; further screening of a large
variety of strains in combination with epidemiological
studies should help to evaluate the role of pSGG
plasmids.

Conclusion 

This study presented the analysis and comparison of the
whole genome sequence of S. gallolyticus subsp. gallolyticus
strain BAA-2069, a causative agent of infective
endocarditis. The results promote identification of
genetic factors concerning the pathogenesis and adhesion
to ECM. Novel candidate genes that probably contribute
to pathogenicity were detected. The comparison
to S. gallolyticus subsp. gallolyticus strain UCN34
revealed significant differences concerning virulence factors,
surface proteins and protective elements.

Furthermore, we detected for the first time the presence
of the pSGG1 plasmid, which contains 21 ORFs,
including mosaic tetracycline resistance genes, and may
play a functional role in lateral gene transfer.

Methods 

Bacterial strains, growth conditions, nucleic acid 
extraction 

The S. gallolyticus subsp. gallolyticus strain was isolated
in 2003 at the Herz- und Diabeteszentrum Nordrhein-Westfalen
from a blood culture from a 68-year-old
woman with aortic heart valve endocarditis and deposited
at the American Type Culture Collection (ATCC,
Manassas, USA) as BAA-2069. The finding was confirmed
by isolation of the same strain from a lesion smear
of the aortic heart valve and by detection in valve excision
material by culture and PCR. The strain was selected
because it had been defined as virulent during earlier
tests [22] and shows phenotypic resistance against oxacillin,
tobramycin, co-trimoxazole, colistin, metronidazole
and tetracycline and intermediate resistance against
gentamicin (minimal inhibitory concentration (MIC) 8
µg/mL). Isolate 010672 with plasmid pSGG2 was isolated
in 2001 at the Herz- und Diabeteszentrum Nordrhein-Westfalen
from a blood culture from a 62-year-old
man with infectious endocarditis, with no obvious connection
to the origin of strain BAA-2069. For genomic
DNA isolation, cells were grown for 12 h in Brain Heart
Infusion Broth (BHI) (Oxoid, Hampshire, United Kingdom)
at 37°C and 200 rpm. DNA extraction was performed
by the Hopwood alkaline lysis method [53].

Genome sequencing, assembly and gap closure 

DNA sequencing was performed using 454 Life Sciences
pyrosequencing technology [54]; the GS-FLX Titanium
platform produced 455,842 reads with an average length of
329 bp. The reads were assembled using Newbler V2.3,
resulting in 38 contigs, 31 of which were larger than
500 bp. The large contigs, obtained with 64.9× coverage,
served as the basis for the
gap closure. Gap closure was performed by custom pri- 
mer walking with long range PCR (using Phusion poly- 
merase, New England Biolabs, Frankfurt (Main), 
Germany) and subsequent Sanger sequencing, resulting 
in 62 reads in total (IIT Biotech, Bielefeld, Germany). 
Long repeat structures (copies of the rrn operon and 
two repeats of 17.4 and 5 kbp respectively) were 
resolved by introducing fake reads based on the consen- 
sus sequence. 
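
As a rough plausibility check of the sequencing depth, the expected coverage can be estimated from the read count, the average read length and the genome size. The sketch below is illustrative only: the function name is hypothetical, the chromosome and plasmid sizes are taken from the abstract, and the estimate will not exactly reproduce the 64.9× reported for the large contigs, which was obtained from the assembly itself.

```python
def estimated_coverage(n_reads: int, mean_read_len: float, genome_size: int) -> float:
    """Return the expected depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * mean_read_len / genome_size


# Read statistics from the text; replicon sizes from the abstract.
READS = 455_842
MEAN_READ_LEN = 329      # bp
CHROMOSOME = 2_356_444   # bp
PLASMID = 20_765         # bp

coverage = estimated_coverage(READS, MEAN_READ_LEN, CHROMOSOME + PLASMID)
print(f"estimated coverage: {coverage:.1f}x")
```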

Genome annotation 

Curation and annotation of the genome were performed 
using the genome annotation system GenDB 2.4 [55]. 
Prediction of coding sequences (CDS) was accomplished 
using Critica [56], Glimmer [57] and Reganor [58]. All 
predicted ORFs were automatically submitted to similar- 
ity searches against nr, Swissprot, KEGG, InterPro, Pfam 
and TIGRfam databases. Putative signal peptides, trans- 
membrane helices and nucleic acid binding domains 
were predicted using SignalP [59], TMHMM [60] and 
Helix-Turn-Helix [61], respectively. The automatic 
annotation of each CDS was manually checked and cor- 
rected according to the most congruent tool results. 

Genome analysis 

S. gallolyticus subsp. gallolyticus BAA-2069 gene content
was compared to S. gallolyticus subsp. gallolyticus UCN34,
S. agalactiae A909, S. dysgalactiae subsp. equisimilis
GGS_124, S. equi subsp. equi 4047, S. sanguinis SK36,
S. suis BM407, S. uberis 0140J, S. pyogenes MGAS9429,
S. pneumoniae ATCC 700669, S. mutans NN2025, S. mitis B6,
S. thermophilus LMD-9, S. gordonii str. Challis substr.
CH1, S. oralis ATCC 35037 and S. salivarius SK126 with
EDGAR [37], which defines orthologous proteins based on
bidirectional best BLAST hits and then calculates BLASTP
score ratio values (SRV). Paralogous genes might be
discarded during the analysis. For each comparison, the
SRV distribution was fitted with a binominal or bibeta
distribution using a self-written R script, and a cutoff
was determined at the point where the probability of
belonging to one or the other peak is equal. Accordingly,
a general cutoff of 0.21 was used to retrieve the core
genes and singletons. LPxTG-related proteins were searched
by screening for the [LYF]P[TSA][GANS] motif and by using
an LPxTG Hidden Markov Model for sortase substrates
created by Boekhorst et al. [46].
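
The orthology criterion described above can be sketched in a few lines. This is a simplified illustration, not EDGAR's code: it assumes that the SRV is the BLAST bit score of a hit divided by the self-hit score of the query, and that both directions of a bidirectional best hit must reach the cutoff. All function names, gene names and scores are hypothetical.

```python
def score_ratio(bit_score: float, self_score: float) -> float:
    """SRV: bit score of a hit divided by the query's self-hit bit score."""
    return bit_score / self_score


def orthologs(best_ab, best_ba, self_a, self_b, cutoff=0.21):
    """Return (gene_a, gene_b, srv) for bidirectional best hits above the cutoff.

    best_ab / best_ba map a query gene to (best_hit, bit_score) in the other
    genome; self_a / self_b map each gene to its self-hit bit score.
    """
    pairs = []
    for gene_a, (gene_b, score_ab) in best_ab.items():
        hit_back = best_ba.get(gene_b)
        if hit_back is None or hit_back[0] != gene_a:
            continue  # not a bidirectional best hit
        srv_ab = score_ratio(score_ab, self_a[gene_a])
        srv_ba = score_ratio(hit_back[1], self_b[gene_b])
        # Assumption: the weaker of the two directions must pass the cutoff.
        if min(srv_ab, srv_ba) >= cutoff:
            pairs.append((gene_a, gene_b, round(min(srv_ab, srv_ba), 3)))
    return pairs


# Toy data (hypothetical gene names and bit scores).
best_ab = {"a1": ("b1", 480.0), "a2": ("b2", 60.0), "a3": ("b1", 300.0)}
best_ba = {"b1": ("a1", 470.0), "b2": ("a2", 55.0)}
self_a = {"a1": 500.0, "a2": 400.0, "a3": 450.0}
self_b = {"b1": 500.0, "b2": 420.0}

core = orthologs(best_ab, best_ba, self_a, self_b)
# Genes of genome A without an ortholog in B under these criteria.
singletons_a = sorted(set(best_ab) - {a for a, _, _ in core})
```

In this toy data set, a2 and b2 are each other's best hits but fall below the cutoff, and a3 hits b1 without being b1's best hit, so both are treated as singletons of genome A.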

Comparison of whole chromosome sequences 

Comparison of whole chromosome sequences was performed
with the MAUVE software using locally collinear blocks
(LCBs). An LCB is defined as a collinear (consistent) set
of multi-MUMs (exact match subsequences shared by all the
considered chromosomes that appear once in each chromosome
and are bordered on both sides by mismatched nucleotides).
The weight of an LCB (the sum of the lengths of the
included multi-MUMs) serves as a measure of confidence
that it is a true orthologous region instead of a random
match; the minimum weight was set to 355. The ORFs or
sequences between the LCBs and any regions with low
similarity (shown as white in LCB) are therefore
classified as strain-specific regions.
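
The weight criterion can be made concrete with a small sketch. It does not use or reproduce the MAUVE software; it only illustrates the definition given above, assuming that 355 acts as a minimum-weight threshold. The class, function and block names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LCB:
    name: str
    mum_lengths: list  # lengths (bp) of the multi-MUMs in this block

    @property
    def weight(self) -> int:
        """LCB weight: sum of the lengths of the included multi-MUMs."""
        return sum(self.mum_lengths)


def filter_lcbs(lcbs, min_weight=355):
    """Keep only blocks whose weight reaches the minimum weight."""
    return [block for block in lcbs if block.weight >= min_weight]


# Toy blocks with made-up multi-MUM lengths.
blocks = [
    LCB("lcb1", [120, 200, 90]),  # weight 410, kept
    LCB("lcb2", [40, 35]),        # weight 75, discarded
    LCB("lcb3", [355]),           # weight 355, kept
]
kept = filter_lcbs(blocks)
```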

Calculation of phylogenetic tree 

EDGAR was used to calculate the phylogenetic tree [49].
For this project comprising 25 genomes, 300 core genes
(orthology cutoff 35% score ratio value) of these genomes
are computed. In the next step, alignments of the core
genes are generated using MUSCLE; non-matching parts of
the alignments are masked by GBLOCKS and subsequently
removed. The remaining parts of all alignments are
concatenated into one large alignment. Based on this
alignment, a distance matrix is calculated using the
Kimura algorithm and used as input for the
neighbor-joining method (both algorithms as implemented in
the PHYLIP package). This yields a phylogenetic tree in
Newick format.
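
The concatenation and distance steps can be sketched as follows. This is not the EDGAR or PHYLIP implementation. It assumes amino-acid alignments and Kimura's approximation for protein distances, D = -ln(1 - p - 0.2 p^2), where p is the fraction of differing sites; for nucleotide alignments a different Kimura model would apply. Function names and sequences are hypothetical.

```python
import math


def concatenate(alignments):
    """Join per-gene alignments (dicts of genome -> aligned sequence)."""
    genomes = sorted(alignments[0])
    return {g: "".join(aln[g] for aln in alignments) for g in genomes}


def kimura_protein_distance(seq_a: str, seq_b: str) -> float:
    """Kimura's protein distance, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    p = sum(x != y for x, y in pairs) / len(pairs)
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("sequences too divergent for this approximation")
    return -math.log(arg)


def distance_matrix(alignment):
    """Pairwise distances as a dict keyed by sorted genome-name pairs."""
    names = sorted(alignment)
    return {
        (a, b): kimura_protein_distance(alignment[a], alignment[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Two toy core-gene alignments for three hypothetical genomes.
gene1 = {"g1": "MKTAY", "g2": "MKTAY", "g3": "MKSAY"}
gene2 = {"g1": "LLDEV", "g2": "LLDEI", "g3": "LIDEI"}

dist = distance_matrix(concatenate([gene1, gene2]))
```

The resulting matrix would then be passed to a neighbor-joining implementation to obtain the tree.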

GC skew analysis 

The GC skew measures the excess of Gs by calculating 
the difference between the number of Gs and Cs (G-C) 
in a sliding window of 1000 nucleotides. The skews 
were cumulated to obtain the cumulative GC skew that 
represents the sum of the GC skews from the first to 
the i-th window.
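
A minimal sketch of this calculation is given below. It assumes consecutive, non-overlapping windows (the step size is not stated in the text) and uses the raw difference G-C per window as described; the function name is hypothetical.

```python
def cumulative_gc_skew(sequence: str, window: int = 1000):
    """Return the cumulative GC skew (G minus C) after each window."""
    sequence = sequence.upper()
    cumulative = []
    running_total = 0
    # Assumption: windows are consecutive and do not overlap.
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        running_total += chunk.count("G") - chunk.count("C")
        cumulative.append(running_total)
    return cumulative


# Toy sequence: a G-rich window, a C-rich window, then a balanced window.
toy = "G" * 1000 + "C" * 1000 + "GC" * 500
skew = cumulative_gc_skew(toy)
```

On a circular bacterial chromosome, the extremes of the cumulative curve are commonly used to locate the origin and terminus of replication.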

Plasmid screening 

Screening of 41 different S. gallolyticus subsp.
gallolyticus strains for the presence of the pSGG1 plasmid
or homologs was performed by Southern hybridization
analysis in accordance with standard protocols. The probe
was prepared by nick translation DIG labeling of pSGG1
following the DIG DNA Labeling Kit (Roche Diagnostics,
Mannheim, Germany) [62]. Furthermore, all strains were
screened by PCR for the presence of the tetL gene using
the whole-genome-sequence-derived primers tet_f (5'-
GCTATGGGAGAAGGTATCG-3') and tet_r (5'-
GAGACAAACCCTGCTACTG-3'), or mob_f (5'-
AAGCTGTACTTGGCTCTC-3') and mob_r (5'-
CAGTGGCAGAACTATCTC-3'), by standard methods.

Nucleotide sequence accession number 

The whole genome sequence of S. gallolyticus subsp.
gallolyticus was deposited at GenBank (Acc. no. FR824043).
The sequence of the newly designated plasmid pSGG1 was
deposited under accession no. FR824044.

Tetracycline susceptibility testing 

For each strain, 200 µL BHI broth (Oxoid, Cambridge, UK)
supplemented with the indicated tetracycline concentration
was inoculated with 1 µL of an overnight culture of the
S. gallolyticus subsp. gallolyticus strain and cultivated
in 96-well plates at 37°C. After 16 h of incubation, OD600
was measured and growth was defined as OD600 > 0.2. The
assay was performed in triplicate.
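
The growth call and the resulting MIC can be sketched as follows. The OD600 threshold of 0.2 is taken from the text; averaging the triplicates and defining the MIC as the lowest tested concentration without growth are assumptions, and the function name, concentrations and readings are hypothetical.

```python
from statistics import mean

GROWTH_THRESHOLD = 0.2  # OD600 above this value counts as growth (from the text)


def mic(od_by_concentration):
    """Return the lowest tested concentration without growth, or None.

    od_by_concentration maps a tetracycline concentration to the list of
    OD600 readings of the replicate wells.
    """
    for concentration in sorted(od_by_concentration):
        # Assumption: replicates are combined by their mean.
        if mean(od_by_concentration[concentration]) <= GROWTH_THRESHOLD:
            return concentration
    return None  # growth at every tested concentration


# Made-up triplicate readings for one strain (concentration -> OD600 values).
readings = {
    0: [0.90, 0.85, 0.92],
    2: [0.70, 0.72, 0.68],
    8: [0.45, 0.40, 0.47],
    32: [0.05, 0.06, 0.04],
    128: [0.03, 0.04, 0.03],
}
strain_mic = mic(readings)
```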

Additional material 



Additional file 1: Pairwise synteny plot of the S. gallolyticus subsp.
gallolyticus BAA-2069 and UCN34 genomes. Every CDS of the first
contig is checked for a reciprocal best BLAST hit. If one is found, the
stop positions of both CDS are read from the database and used as
coordinates for a dot.

Additional file 2: Unique genes of S. gallolyticus subsp. gallolyticus 
BAA-2069 in relation to S. gallolyticus subsp. gallolyticus UCN34 

Unique genes calculated by EDGAR analysis. 

Additional file 3: Unique genes of S. gallolyticus subsp. gallolyticus 
UCN34 in relation to S. gallolyticus subsp. gallolyticus BAA-2069 

Unique genes calculated by EDGAR analysis. 

Additional file 4: Core genome set of S. gallolyticus subsp.
gallolyticus BAA-2069 and three Enterococcus faecalis strains

The following strains were used for calculation by EDGAR: E. faecalis 62 (Acc.
no. CP002491), E. faecalis OG1RF (Acc. no. CP002621) and E. faecalis V583
(Acc. no. NC_004668).

Additional file 5: Number of unique or common ORFs. Numbers
represent the common or unique ORFs in comparison to BAA-2069 and
indicated species.

Additional file 6: Agarose gel electrophoresis of restriction
fragment pattern. Patterns were obtained with seven different enzymes
for plasmids pSGG2 (left lane) and pSGG1 (right lane). Ladder
marker: 1 kb Ladder plus (Fermentas, St. Leon-Rot, Germany).

Additional file 7: Tetracycline susceptibility test. Minimum inhibitory
concentration (MIC) was determined by growth in liquid cultures with
the indicated tetracycline concentrations.